Long noncoding RNAs (lncRNAs) are >200 nucleotides in length, lack protein-coding potential, and together represent ~2% of the human genome. LncRNAs control important biological processes such as cell division, differentiation, and apoptosis. They do so by binding chromatin, RNA, and proteins, thereby regulating gene expression, protein synthesis, and mRNA splicing, among other processes. Changes in lncRNA expression have been clearly implicated in malignant cell transformation, including in acute myeloid leukemia (AML).

As changes in protein-coding gene expression are key to AML development, we asked whether similar changes occur at the lncRNA level. First, we developed a bioinformatic pipeline for the discovery of unannotated intergenic lncRNAs in 898 AML samples from 2 pediatric (AML-05 and TARGET) and 2 adult (BEAT AML and TCGA) cohorts. StringTie was used for transcriptome assembly. Subsequently, transcripts annotated in Ensembl, Gencode, LNCipedia, and RefSeq were removed. To minimize false-positive results, the analysis was limited to spliced, intergenic lncRNAs, and transcripts located within 2000 bp upstream or downstream of annotated genes were excluded. Lack of coding potential was assessed using CPAT and PLEK. Using this workflow, we identified 1560 novel intergenic lncRNAs in AML. These lncRNAs were spread across the human genome, with expression levels similar to those of annotated lncRNAs. Their identification expanded the Gencode lncRNA library by 27%. To determine whether the new lncRNAs follow the same canonical transcription rules as known lncRNAs, we performed RNA-seq, CAGE-seq, and ChIP for H3K4me1, H3K4me3, and H3K27ac on 6 primary KMT2A::MLLT3 AML samples. In these samples, 220 of the 1560 novel lncRNAs were expressed, of which 60% overlapped with at least one histone mark indicative of active promoters, in line with what has been described for known lncRNAs.
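
The structural filters described above (spliced, longer than 200 nucleotides, and not within 2000 bp of an annotated gene) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the `Transcript` class, the function names, and the half-open coordinate convention are hypothetical, strand is ignored, and the removal of annotated transcripts and the CPAT/PLEK coding-potential step are not modeled.

```python
from dataclasses import dataclass

FLANK_BP = 2000       # exclusion window around annotated genes (from the text)
MIN_LENGTH_NT = 200   # lncRNA length threshold (from the text)


@dataclass(frozen=True)
class Transcript:
    """Hypothetical container for an assembled transcript."""
    chrom: str
    exons: tuple  # sorted (start, end) pairs, half-open coordinates (assumption)

    @property
    def start(self):
        return self.exons[0][0]

    @property
    def end(self):
        return self.exons[-1][1]

    @property
    def length(self):
        # Spliced length: sum of exon lengths, introns excluded.
        return sum(e - s for s, e in self.exons)


def overlaps_flanked_gene(tx, genes, flank=FLANK_BP):
    """True if tx overlaps any gene extended by `flank` bp on both sides.

    genes: iterable of (chrom, start, end) tuples; strand is ignored here.
    """
    return any(
        chrom == tx.chrom and tx.start < g_end + flank and tx.end > g_start - flank
        for chrom, g_start, g_end in genes
    )


def is_candidate(tx, genes):
    """Apply the three structural filters: spliced, >200 nt, intergenic."""
    return (
        len(tx.exons) > 1
        and tx.length > MIN_LENGTH_NT
        and not overlaps_flanked_gene(tx, genes)
    )
```

In a real workflow, an interval index would replace the linear scan over genes, and the surviving candidates would then be passed to the coding-potential tools.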

Next, we determined whether genetic AML subclasses show specific lncRNA expression patterns, as observed for protein-coding genes. We performed separate UMAP analyses of the 1000 most variable protein-coding transcripts and the 1000 most variable lncRNAs (including 352 new ones) on all AML cohorts. For all projections, we observed robust clustering of samples according to genetic subclass. Cases with CBFB::MYH11, KMT2A::MLLT3, RUNX1::RUNX1T1, PML::RARA, and CEBPA mutations were clearly separated, alongside three distinct clusters of NPM1-mutated cases. Healthy bone marrow samples showed a higher degree of separation with lncRNAs than with protein-coding genes. This suggests that lncRNA expression is more deregulated than protein-coding gene expression in AML. In line with the specific clustering of genetically defined AML classes, we identified sets of lncRNAs specific to these classes using weighted gene co-expression network analyses. Subtle yet discernible differences between adult and pediatric AML were observed. To determine how the identified lncRNA sets depend on the action of mutated transcription factors, we analyzed lncRNA expression following dTAG-induced degradation of KMT2A::MLLT3 and retinoic acid-induced degradation of PML::RARA in publicly available AML models. This revealed that high lncRNA expression in these leukemia subclasses was normalized upon inactivation of the mutated transcription factors (p<0.01). Thus, specific lncRNA expression patterns characteristic of AML depend on mutated transcription factors. We conclude that lncRNAs show a similar degree of transcriptional change as protein-coding genes in AML and that mutated transcription factors are key to these changes. Given that lncRNAs play key roles in a plethora of biological processes, it will be important to determine whether changes in their expression contribute to AML development.
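
Selecting the most variable transcripts before a UMAP projection can be sketched as follows. The abstract does not state which variability measure or normalization was used, so this sketch assumes plain population variance over already normalized (for example log-transformed) expression values; the function name and input layout are hypothetical, and the UMAP step itself is not shown.

```python
from statistics import pvariance


def top_variable(expr, n=1000):
    """Return the ids of the n transcripts with the highest variance.

    expr: dict mapping transcript id -> list of expression values,
          one value per sample (assumed to be normalized already).
    """
    ranked = sorted(expr, key=lambda tx_id: pvariance(expr[tx_id]), reverse=True)
    return ranked[:n]
```

The selected ids would then define the feature matrix handed to the dimensionality-reduction step, once for protein-coding transcripts and once for lncRNAs.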

Disclosures

No relevant conflicts of interest to declare.
